Aging and species abundance in the Tangled Nature model of biological evolution. 
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We present an individual based model of evolutionary ecology. The reproduction rate of indi- 
viduals characterized by their genome depends on the composition of the population in genotype 
space. Ecological features such as the taxonomy and the macro-evolutionary mode of the dynam- 
ics are emergent properties. The macro-dynamics exhibit intermittent two mode switching with 
a gradually decreasing extinction rate. The generated ecologies become gradually better adapted 
as well as more complex in a collective sense. The form of the species abundance curve compares 
well with observed functional forms. The models error threshold can be understood in terms of the 
characteristics of the two dynamical modes of the system. 
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I. INTRODUCTION 

The dynamics and organization of biological ecosys- 
tems is a fascinating example of complex interacting sys- 
tems with many levels of emerging structure and time 
scales. Biological evolution creates intricate taxonomic 
hierarchies presumably as an effect of mutation, natural 
selection and the ensuing adaptation. Taxonomic struc- 
tures from the level of individuals through species and 
genera up to kingdoms are generated and vanish again in 
a never ending succession. Different strata in the hierar- 
chy are described by very different timescales and with 
very different types of dynamics. At the level of indi- 
viduals, fairly well defined characteristic lifetimes exist 
for each specific type (species) and the population dy- 
namics can be considered smooth. This picture changes 
as one considers the system at the more coarse grained 
level of species and genera. The lifetime distribution of, 
e.g., genera is broad (see e.g. jjj) and the dynamics is 
intermittent fi 0, 0, E|. In the spirit of the traditional 
approach of statistical mechanics it is interesting to con- 
sider models, defined at a microscopic level, which are 
able to reproduce the large scale temporal and taxonomic 
structures. 

In the present paper we consider a model of individuals 
identified solely by their genome. The model was intro- 
duced in Ref. || where we also presented a discussion of 
the qualitative behavior of the model. We combine ecol- 
ogy with evolution by considering interacting individuals 
which can multiply (sexually or asexually) subject, po- 
tentially, to mutations. The size of the total population 
fluctuates, the average being controlled by the amount of 
available resources. From these three minimal ingredients 
emerge segregation in genome space, to be interpreted as 
the appearance of species, and a complex intermittent 
dynamics, to be interpreted as extinction and creation 
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events at the higher taxonomic levels. The entire taxo- 
nomic hierarchy is an emergent property of the dynamics 
at the microscopic level of individuals. We characterize 
the configurations generated in genotype space in terms 
of the species abundance curve, and find a good qualita- 
tive agreement with the functional form typically found 
for real ecosystems. The intermittent dynamics is charac- 
terized by the statistics of the duration of the quasi-stable 
epochs or in other words the waiting times between tran- 
sitions. We find a broad distribution of durations and 
observe a gradual aging of the macro-dynamics. No sta- 
tionary state is ever reached. 



A. Related Models 

Many mathematical models of biological evolution 
have been developed according to the usual statistical 
mechanics agenda of generating the macroscopic complex 
behavior from simplistic microscopic definitions. An ele- 
gant review of this endeavor has recently been given by 
Drossel §. Here we limit ourselves to a discussion of 
similarities and differences between our model and re- 
lated studies. 

Let us first mention models which define the ecosys- 
tem in terms of individuals. Higgs and Derrida Q stud- 
ied speciation in a model consisting of a fixed number of 
individuals. Each individual is represented by a genome 
modeled as a string of zeros or ones, like in Eigen and 
collaborators' seminal work on quasi-species |J. Higgs 
and Derrida demonstrated that a sexually reproducing 
population breaks up into distinct species when only in- 
dividuals with a sufficiently similar genome sequence are 
allowed to produce offspring. This agrees with a large 
bulk of experimental work |l(j] . Gavrilets and collabora- 
tors |n], [l2| [l3| have made use of similar models gener- 
alized in particular to be able to study geographical and 
temporal aspects of speciation. These studies differ from 
ours in assuming a fixed population size and by defining 
a fitness function for pairs of individuals which is con- 
stant if the Hamming distance between the genomes of 
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two individuals is small enough, and zero otherwise. Our 
model allows the total size of the population to fluctuate 
and the fitness of pairs of individuals (or in the asexual 
case single individuals) depends on the composition of 
the population at a given instant in time. 

It is also important to mention the fitness landscape 
approach first pioneered by Wright [jl4|, |l5) , who consid- 
ered gene frequencies, and was brought to the attention 
of the statistical mechanics community mainly through 
Kauffman's so called NK model [[l6j [T^j. The main focus 
of the NK model, and of the later co-evolutionary NKC 
model @ , is the study of epistatic interactions (the in- 
fluence of one gene on another) by use of fitness functions. 
The main difference between our model and Kauffman's 
models is that the fitness of an individual in our system 
depends on the frequencies by which other locations in 
genotype space are occupied. 

Taylor and Higgs [[l8) have studied pleiotropy and epis- 
tasis (the influence of one gene on several traits and 
the influence of one gene on another) in a model that 
combines and generalizes aspects of the Higgs-Derrida 
model with the epistatic interactions of Kauffman's mod- 
els. Taylor and Higgs then derive a phenotypical fitness 
for the specific genotype. Kaneko and Yomo JTq| have 
also studied models in which the difference between phc- 
notype and genotype is accounted for explicitly. In our 
model we make the drastic simplification not to distin- 
guish between genotype and phenotype. 

Other models consider species as the elementary build- 
ing block; these models neglect the specifics of the dy- 
namics arising from reproduction and mutations at the 
level of individuals. The simplest of these models is the 
Bak-Sneppen model ppfl . The model aims to demon- 
strate that co-evolutionary interactions are sufficient to 
produce intermittent dynamics which is then related to 
intermittency in the fossil record and to Eldredge and 
Gould's concept of punctuated equilibrium [| ||. 
Each species is characterized by a single number between 
zero and one, the fitness, and the total number of species 
is kept constant. The model has interesting statistical 
properties but is difficult to relate to biological evolution. 

Species level models of more detail than the Bak- 
Sneppen model have been formulated recently by McK- 
ane, Alonso and Sole and by Drossel, Higgs and McK- 
ane [^2|. The emphasis in these models is on predator- 
prey interactions and food-webs and are generalizations 
of early work by May J23| and May and Anderson |p4| . 
Our model is intended to include all types of interactions 
between individuals, e.g. antagonistic or collaborative 
relationships, in addition to predator-prey competitions. 
Another important difference is that we define our model 
at the level of individuals in order to be able to study the 
emergence of species, something not possible in a species 
based model. 

Most models of biological evolution assume that the 
dynamics is in a statistically stationary state. One 
marked exception is the model considered by Sibani and 
collaborators |25|, |2l, |27|, |2l| . This is an abstract species 



based model consisting of random walks in a rugged fit- 
ness landscape. The statistics of the jumps in this land- 
scape are the same as the record statistics considered 
some time ago by Sibani and Littlewood |2S|] . The pace 
of the dynamics of the model gradually slows down as in- 
dicated by a logarithmically decreasing extinction rate. 
As we shall see below our individual based model also 
exhibits aging, a property found to be consistent with 
analysis of the fossil record jl| . 

The paper is organized as follows. In the next section 
we define the model in detail. In Sec. Ill we discuss 



the modes of the models emergent dynamics. In Sec. IV 
we show how the configurations generated dynamically 
gradually become better adapted in a collective sense. 
Sec. |v| demonstrates that the ecologies generated in the 
model exhibit characteristics similar to those observed in 
real ecologies. In Sec. [vj we discuss the models ability to 
address the issue concerning why sexual reproduction can 
compete evolutionary with asexual reproduction. Sec. 
VII contains an analy sis of the error threshold. We briefly 
present in Sec. VIII a scan of the behavior of the model 
for a range of the control parameters and in Sec. IX we 
conclude and summarize. 



II. DEFINITION OF MODEL 

We describe here in detail the structure and dynam- 
ics of the model which we, with an allusion to Darwin's 
notion of the Tangled Bank, called the Tangled Nature 
(TaNa) model to stress the model's emphasis on ecolog- 
ical interactions Q . 



A. Interaction 

We represent an individual by a vector S c 
(Si , S%,...,Sl) in genotype space S. This representation 
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is frequently used, see e.g. Ref. 

S" may take the values ±1, i.e. S a denotes one of the 
corners of the L dimensional hypercube (in the present 
paper we use L = 20). The coordinates «Sf may be in- 
terpreted as genes with two alleles, or a string of cither 
pyrimidines or purines. We think of genotype space iS 
as containing all possible ways of combining the genomic 
building blocks into genome sequences. Many sequences 
may not correspond to viable organisms. Whether this 
is the case or not is for the evolutionary dynamics to de- 
termine. All possible sequences are made available for 
evolution to select from. 

Individuals are labeled by Greek letters a, (3, ... = 
1, 2, N(t). When we refer, without reference to 
a specific individual, to one of the 2 L positions in 
genome space, we use roman superscripts S a , S b , ... 
with a, 6,... = 1,2, ...,2 . Many different individuals 
S Q ,S^,..., may reside on the same position, say S a , in 
S. 



The ability of an individual a to reproduce is controlled 
by H(S a ,t): 

^ (SV) = ~c~m 2 J(SQ ' s)n(s ' <} - (1) 

where c is a control parameter (see below), N(t) is the 
total number of individuals at time t, the sum is over 
the 2 L locations S in S and n(S,t) is the occupancy of 
position S. Two positions S a and S b in genome space are 
coupled with the fixed random strength J ab = J(S a , S b ) 
which can be either positive or negative or zero. The 
coupling is non-zero with probability O (throughout the 
paper we use = 0.25), in which case we assume J ab ^ 
J ba to be a deterministic but erratic function of the two 
positions S a and S b . We have checked that the specific 
details of the form of the distribution of the non-zero 
values of the function J(S a , S b ) are irrelevant. We choose 
accordingly a form mainly determined by its numerical 
efficiency. In the next subsection we describe the details 
of the specific procedure used. The distribution of the 
generated interaction strengths is shown in Fig. below. 
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FIG. 1: Examples of possible realizations of the couplings 
J a between different positions S a and S b in genotype space 
representing collaborative (+,+), antagonistic (-,-) or preda- 
tor prey (+,-) & (">+) relationships. 



1. Generation of interaction matrix 

The interaction between two locations in genotype 
space, S a and S b is generated as a product J(S a ,S b ) = 
8(S Q , S b )/(S a , S 6 ). The first factor 6(S Q , S b ) is obtained 
by interpreting the sequences S a and S b as binary num- 
bers (letting — 1 i— > 0) and perform the XOR operation 
on the binary pair to obtain a new integer. This integer 
is used as an index in a lookup list to obtain either a 
or 1 as the value of <d(S a , S b ) . In case 1 is returned, the 
element of the /(S a , S b ) matrix is obtained in a similar 
way. This time, however, two arrays are needed. Each 
auxiliary array is of length 2 L and now the arrays con- 
tain uniformly distributed random numbers drawn from 
the interval [—1, +1]. The pair of arrays is necessary in 
order to reproduce the asymmetry of the I(S a ,S b ) ma- 
trix. Two indices are generated from the S a and S b . The 
first via the same XOR operation used to calculate the 
6(S a , S b ) matrix element, whereas the second is simply 
the integer representing S b . The strength of interaction 
is taken to be the product of the members of each ar- 
ray at the appropriate location. This ensures that the 
elements of the matrices are non-symmetric due to the 
second array index depending on the order of the oper- 
ation. This procedure is numerically extremely efficient 
and deterministic, but has the side effect of generating a 
distribution of a slightly unusual form, see Fig. [| 

We stress that the coupling matrix J(S a , S b ) is meant 
to included all possible interactions between two individ- 
uals of a given genomic constitution. In our simplistic 
approach, a given genome is imagined to lead uniquely 
to a certain set of attributes (phenotype) of the indi- 
viduals/organisms. The locations S a and S b represent 
blueprints for organisms that exist in potentia. The po- 
sitions may very likely be unoccupied but, if we were 



to construct individuals according to the sequences S a 
and S b the two individuals would have some specific fea- 
tures. The relationship between an organism of design 
S a and one of design S b may be as predator and prey 
or parasitic, i.e. J ab > and J ba < 0, but it can also 
be collaborative (J ab > and J ba > 0) or antagonis- 
tic (J ab < and J ba < 0), see Fig. 0. And certainly 
in some cases J ab may represent less direct couplings, 
e.g. some animals may not eat trees, nevertheless they 
breath the oxygen produced by the rain forest. In or- 
der to emphasis co-evolutionary aspects we have excluded 
"self-interaction" among individuals located at the same 
position S in genome space, i.e. J(S, S) = for all S € <S. 
It is important to mention that including self-interactions 
of the same order of strength as the J-couplings does not 
change the qualitative behavior of the model. 

The conditions of the physical environment are simplis- 
tically described by the term /LtiV(i) in Eq. (1), where [i 
determines the average sustainable total population size. 
That is, the total carrying capacity of the environment. 
An increase in fj, corresponds to harsher physical con- 
ditions. This is a simplification, though one should re- 
member that what is often considered as the physical 
conditions, e.g. temperature or oxygen density, is to a 
degree determined by the activity of other organisms and 
is therefore really a part of the biotic conditions. Con- 
sider, for example, the environment experienced by the 
bacterial flora in the intestines. Here one type of bacte- 
ria live very much in an environment strongly influenced 
by the presence of other types of bacteria. In this sense 
some fluctuations in the environment may be thought of 
as included in the coupling matrix J(S a , S b ). 
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B. Reproduction, mutations and annihilation 

Asexual reproduction consists of one individual being 
replaced by two copies. Successful reproduction occurs 
for individuals S a with a probability per time unit given 
by 



Poff(S a ,t) = 



exp[ff(SV)] 
l + exp[tf(S«,i)] 



€[0,1]. 



(2) 



In the case of sexual reproduction an individual S" 
is picked at random and paired with another ran- 
domly chosen individual S* 3 with Hamming distance d = 

l£i=i \&i ~ &i\ - dmax (allowing at most d max pairs 
of genes to differ) . The pair produces an offspring 7 with 
probability ^/p //(S Q , i)p ff(SP, t), where is chosen 
at random from one of the two parent genes, either S" 
or Sf ' . For d max > 1 this procedure may be thought of 
as being similar to recombination. The maximum sepa- 
ration criterion has been studied by several authors, see 
e.g. 

We allow for mutations in the following way: with 
probability p mu t per gene we perform a change of sign 
57 — > —S?, during the reproduction process. 

For simplicity, an individual is removed from the sys- 
tem with a constant probability pkm per time step (we 
use pkm = 0.2). This procedure is implemented both for 
asexual and sexually reproducing individuals. 

A time step consists of one annihilation attempt fol- 
lowed by one reproduction attempt. One generation con- 
sists of N(t)/pkui time steps, which is the average time 
taken to kill all currently living individuals. 

Initially we place N(0) — 500 individuals at randomly 
chosen positions. The results are independent of initial 
conditions. We obtain the same results if all individuals 
are located at the same position initially. 

The present paper's main focus is on the asexual mode 
of reproduction and results presented are for asexual in- 
dividuals except otherwise stated. For completeness and 
for comparison we do, however, in Sees. VI and VIII 



consider sexual reproduction, simplicity which is defined 
as follows. 



III. 



DYNAMICAL STABILITY 



Neglecting fluctuations in the occupancy n(S,t) the 
above dynamics is described by the following set of equa- 
tions (one equation for each position in the genotype 
space): 



n(S,t + 1) = n(S,i) 

+ {p of f(S,t)[2(l -p m ut) L - 1] -Pkill} 



n(S,t) 



2 Pmu t(l-Pmut) L - 1 £ p off (S',t)^^-, (3) 
<S',S) ^ ' 
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FIG. 2: The occupation in genotype space plotted as a func- 
tion of generation time. The genotypes are enumerated in 
an arbitrary manner. If a position is occupied at a given 
moment in time a dot is placed at the corresponding num- 
ber along the y-axis at that instant in time. Parameters are 
c = 0.5, fi = 0.005 and Lp mut = 0.25. 



where the sum is over the nearest neighbors of S. Station- 
ary solutions require the system to find configurations in 
genotype space for which all positions satisfy the demand 
that either n(S,t) — or if n(S,t) ^ (neglecting the 
mutational back flow represented by the last term in Eq. 
(ft)), we must have 



Poff 



Pkill 



2{l-p mut ) L 



Pq-ESS 



(4) 



The fitness p Q ff(S a ,t) of individuals at a position S a de- 
pends on the occupancy n(S fc ,t) of all the sites S b with 
which site S a is connected through couplings J ab . Ac- 
cordingly, a small perturbation in the occupancy at one 
position may be able to disturb the balance in Eq. (jij) 
between p a ff(S, t), pkm and p m ut on connected sites. In 
this way an imbalance at one site can spread as a chain 
reaction through the system, possibly causing a global 
reconfiguration of the occupancy in genotype space. 

We show in Fig. || the occupancy in genotype space 
plotted as a function of time for asexual reproduction. 
Periods of stable configurations are separated by fast 
transitions. We have called the stable periods "quasi- 
Evolutionary Stable Strategies" or q-ESS since they are 
reminiscent of the Evolutionary Stable Strategies (ESS) 
introduced by Maynard Smith pi] ]. 

It is interesting to investigate just how stable the q- 
ESS are. We have done this by applying different types 
of perturbations in the q-ESS. The result is that the q- 
ESS are very stable against global perturbations such as 
a brief or a lasting increase in control parameters fj,, c or 
Pkm- Changes of up to 50% in these parameters, cither 
permanently or for a period of 100 generations, only effect 
the total population size and is typically not able to kick 
the population out of its present q-ESS configuration in 
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genotype space. In contrast, a similar perturbation of the 
mutation rate easily destabilizes the q-ESS configuration. 

We stress that the segregation (or speciation) to be 
discussed below is an effect of different couplings be- 
tween different positions S a and S b . When we assume 
J(S a ,S b ) = J independent of S a and S b , the popula- 
tion is not concentrated around a subset of the positions 
in genotype space, instead the population is smeared out 
through the space in a diffuse manner. Self-interaction, 
however, can cause segregation in a rather trivial way. 
Namely, if we include a distribution of J(S, S) values, 
segregation may occur even in the case where all inter- 
action terms assume the same value: J(S a , S b ) = Jo for 
S a ^ S b . However, this type of selection of configura- 
tions in genotype space is not very interesting since the 
sites to become occupied is determined by the arbitrarily 
assigned self-interactions J(S,S) and not by the collec- 
tive dynamical adaptation at play when J(S, S) = and 
J(S a ,S b ) assumes a distribution of different J- values. 
In reality one will expect the selection of species to be 
caused by a mixture of self-interaction and interaction 
between different species. To decide which one is domi- 
nant might be difficult and will certainly be system spe- 
cific. 

There is a significant difference between the distribu- 
tion of active couplings, p act (J(S a , S h )), in the q-ESS and 
the distribution during the hectic transitions. We show 
in Fig. U the distribution from which the Jf, ore (S a , S b ) 
are sampled together with the distribution of couplings 
between occupied sites after a large number of genera- 
tions. During the hectic phases there is clearly no no- 
ticeable difference between the "bare" distribution of the 
J(S a ,S b ) and the distribution of active couplings, i.e. 
couplings between occupied positions. During the q-ESS 
we observe a slight bias towards positive J-values of the 
active couplings. This slight shift towards more positive 
couplings will, according to Eqs. [l| and ||, lead to an 
increased reproduction rate during the q-ESS. The man- 
ifestation of this difference between the hectic periods 
and the q-ESS is illustrated in Fig. We see that the 
distribution of iJ-values in the q-ESS during both modes 
of the dynamics contains a narrow peak. In the q-ESS, 
the peak in p(H ) is separated from a strongly negative 
band of support. The values of H in this band are so neg- 
ative that the corresponding p Q ff (H) are negligible (see 
the insert in Fig. ^J). Genotype positions corresponding 
to this band consist of unfit positions next to highly oc- 
cupied and very fit positions. The reason these positions 
are occupied at all is that they are supplied by mutations 
occurring on the neighboring fit positions. The conclu- 
sion of these considerations is that the dynamics during 
the q-ESS as well as during the hectic periods are con- 
trolled by the reproduction of individuals with iJ-values 
in the two respective peaks of p(H). 

The location of the peaks of p(H) is determined in 
the following way. During the hectic periods the occu- 
pation of positions in genotype space is highly unstable 
and n(S,t + 1) is only related to n(S,t) in an erratic 
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FIG. 3: The distribution from which the values of the 
couplings J(S a ,S 6 ) are draw at the start of the simulations 
(dashed curve) together with the probability density function 
of the couplings between occupied sites (solid curve) during 
the hectic periods (top panel) and during the q-ESS (bottom 
panel). Parameters are c = 0.01, /j, = 0.01 and Lp mu t = 0.2. 
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FIG. 4: The probability density function for the weight 
function H (main frame) and reproduction rates Poff (in- 
sert) during the hectic transitions (dashed curve) and in the 
q-ESS (solid curve). Parameters are c = 0.08, /i = 0.005 and 
Lpmut =0.25. 



way, the balance equation Eq. (g) is never fulfilled for 
nonzero n(S,t) = 0. The only constraint on p fj during 
the hectic periods is accordingly that the total popula- 
tion remains constant on average which implies that on 
average p ff = Pkui ■ This explains why in Fig. ^ the 
peak in p{H) during the hectic periods corresponds to a 
peak in p(p ff) centered at Pkui = 0.2. The situation is 
different during the q-ESS. Here the occupation of the se- 
lected positions in genotype space remains approximately 
constant and Eq. (||) applies. Substituting the relevant 
values pkui — 0.2, p mut — 0.0125 and L = 20 into Eq. 
(^) produces Poff = 0.36 which explains the position of 
the peak in p(p ff) during the q-ESS. 
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FIG. 5: The average number of transitions during a window 
of size T = 1000 generations as a function of generation time. 
Parameters are c = 0.01, /j, = 0.01 and Lp mu t = 0.2. Tthe 
average is over 400 realizations. 
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FIG. 6: The ensemble averaged total population (top) and 
diversity (bottom) as function of generation time for the same 
ensemble as in Fig. S 



IV. 



AGING 



For simplicity we concentrate again on the asexual 
model in this section. For large genome length L the sys- 
tem is always in a transient. The time needed to reach 
the stationary state increases exponentially with L and is 
therefore unreachable for any biologically relevant values 
of L. 



A. Increasing q-ESS durations 

The gradual change in the statistical measures of the 
model is seen directly as a slow increase with time of the 
average duration of the q-ESS. To demonstrate this we 
show in Fig. [| the average number of transitions O^i) 
between q-ESS within a time window of fixed size T as a 
function of time t measured in number of generations. It 
is clear that Clrit) decreases with increasing t, however 
it is very difficult to obtain sufficient statistics to be able 
to determine the functional dependence of fir(t) ° n t, 
though a very slow exponential t dependence is suggested 
by Fig. ||. Despite these sampling difficulties, it is evident 
that the duration of the q-ESS, on average, increases with 
time. This corresponds to a decrease in the extinction 
rate, consistent with analysis of the fossil record 0. 



B. Increasing population size, diversity and 
complexity 

The gradual growth of the duration of the stable q- 
ESS epochs indicates that the dynamics of the system is 
able to produce more stable or better adapted configu- 
rations in genotype space. It is difficult to test quanti- 
tatively the stability of the q-ESS with respect to per- 
turbations. That the population is distributed in an in- 



creasingly more efficient manner in genotype space can 
be seen directly from the increase in the total population 
size N(i) averaged over an ensemble of different realiza- 
tions of the stochastic elements of the dynamics. Fig. 
^ contains the average total population (N(t)) together 
with the ensemble average of the diversity (D(t)), where 
D(t) is defined as the number of different occupied posi- 
tions S in genotype space at time t. The average diversity 
also increases with time. 

Let us briefly consider how the total population size 
can increase. We saw in Sec. Ill that essentially p a ff 
is narrowly distributed either about Pkui or, as in the q- 
ESS, about the value p q -ESS in Eq. (|J). The increase 
in N(t) is therefore not an effect of a gradual increase 
in Poff- Simulations indeed confirm that the average 
offspring probability always remains constant over the 
entire run. In biological observations and experiments, 
for reasons of uniqueness, the reproduction rate is iden- 
tified as fitness. In this sense the fitness of the individ- 
uals remains, on average, constant in the TaNa model, 
as presumably is also the case in biological macroevo- 
lution; though the microbial experiments by Lenski |3^ ] 
demonstrate that the reproductive fitness can increase as 
a result of adaptation in microevolution. 

The increase in the average population size (N(t)) ob- 
served in the TaNa model is caused by the system ability 
to generate configurations that increase the interaction 
term in the weight function H(S,t) defined in Eq. o). 
When the first term increases, the second term fiN[t) 
in Eq. (El) can increase as well, while the total H(S,t) 
remains on, average, fixed. 

The increase in the interaction term of H(S,t) is 
achieved in several ways. Firstly, the population is spread 
out onto an increasing number D(t) of different geno- 
types, as seen in Fig. ||. Moreover, the evolutionary 
dynamics tends to produce occupied sites which are in- 
teracting with an increased number of other occupied 
sites, i.e., the number of non-zero terms </(S a , S)n(S) in 
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FIG. 7: The number of occupied positions in genotype space 
with a given number of active links connected to other occu- 
pied position in genotype space. The solid line is after 500 
generations, the dashed line is recorded after 10 5 generations. 
Parameters are c = 0.01, p = 0.01 and Lp mu t = 0.2. 



FIG. 8: The probability density p(r) of the logarithmic wait- 
ing time r. The top graph is a double logarithmic plot of p(r) 
exhibiting a power law behavior in the region of small r val- 
ues. The bottom plot is a linear-log plot of the same data. 
Here one sees that the behavior of p(r) for large values of r 
is consistent with a slow exponential decay. Parameters are 
as for Fig. Fl 



H(S a , t) in Eq. (|l]) grows as the system produces config- 
urations that are able to benefit better from the possible 
mutual interactions represented by J(S Q , S). The distri- 
bution of active interaction links is shown at an early and 
a much later time in Fig. (?]. We have not been able to 
resolve a shift with time in the distribution of the values 
of the active interaction strengths J(S a ,S b ). 

The increase in the diversity and the number of ac- 
tive links connected to an occupied position can be inter- 
preted as an increase in the complexity of the configura- 
tions produced by the evolutionary dynamics. Selection 
and adaptation operate at the level of the entire con- 
figuration in genotype space rather than at the level of 
individual genotypes. This highlights that the biological 
concept of fitness makes most sense when considered as 
a collective property of an ecology, rather than an observ- 
able characteristic of the individual species or individual 
members of a population. 



C. Record Statistics 

The observation in the previous section that the dy- 
namics of the TaNa model leads to an increase in a num- 
ber of measurable quantities, taken together with the 
intermittent nature of the dynamics, suggests that the 
transitions between consecutive q-ESS epochs correspond 
to record transitions. One can imagine that some char- 
acteristic measures of the collective level of adaptation of 
the configurations generated in genotype space achieves 
an ever increasing value as the system undergoes a tran- 
sition from one q-ESS to the next. 

Sibani and collaborators |25|, |2(| ^7], |29| have stud- 
ied record dynamics and shown that the probability for 
n records in a sequence of t independently drawn ran- 



dom numbers is Poisson distributed on a logarithmic time 
scale, or equivalently: that the logarithm of the ratio of 
the time between the fc-th and the (k — l)-th record, 
Tk = ln(tfc /tfc-i) , is exponentially distributed: P(t > 
x) = cxp(— Xx). Sibani and coworkers |2^, p7|, |2|] 
have also demonstrated the relevance of record statistics 
to the dynamics of the Kauffman NK model |l6|, |l^] . 

Accordingly, it is interesting to investigate if the ag- 
ing observed in the TaNa model exhibits signs of record 
statistics. To do this, we study the distribution of the 
variable = h\(tk/tk-i) where tk denotes the time 
at which the fc-th transition between consecutive q-ESS 
epochs occurs. We show in Fig. ||that is exponentially 
distributed for large values and algebraically distributed 
for small values of Tfc. 

The exponential tail in Fig. ||| suggests that the tran- 
sition times in the TaNa model follow record statistics in 
the region of large r values, corresponding to the regime 
of long q-ESS durations. The algebraic form of p(r) for 
small r values indicates significant correlations for tran- 
sitions that occurred in rapid succession. The question 
is which quantity evolves according to record dynamics. 
We have so far not been able to identify a variable of the 
system which jumps monotonously to ever higher values 
at the transition times. In the evolution models stud- 
ied by Sibani et al. |2j| ^6|, |2?], the fitness increases 
through consecutive records. As mentioned above in the 
TaNa model the reproductive fitness Poff remains, on 
average, constant. The increase in the average duration 
of the q-ESS (see Fig. |J) suggests that the stability of 
the configurations in genotype space gradually increases. 
To explore this, one should study the temporal behavior 
of the eigenvalue spectrum of the stability matrix of the 
effective evolution equations in (|J). We expect that the 
number of unstable directions, on average, decreases with 
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FIG. 9: The species abundance distribution. The peak in 
the distribution is compared with the lognormal form (dashed 
curve). Parameters are c = 0.01, fx = 5 • 10 -5 and Lp mu t = 
0.2. 



time, though for a given realization fluctuations probably 
prevent a strictly monotonous behavior. 
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FIG. 10: Run with mixed population. Top frame contains 
the size of the total population (solid curve), the asexual pop- 
ulation (dotted curve) and the sexually reproducing subpopu- 
lation (dashed curve). The frame below shows the occupation 
of the genotypes for the sexually reproducing subpopulation 
(top half) and the asexual subpopulation (bottom half). Pa- 
rameters are c = 0.08, fi = 0.001, Lp m ut = 0.3 and d m ax = 1. 



V. SPECIES ABUNDANCE FUNCTION 

The fundamental quantity to describe an ecology is 
the species abundance function J33|. The species abun- 
dance W(p) is the ratio W of species which contains a 
ratio p of the total population. We plot in Fig. ^ the 
species abundance function for the TaNa model during a 
q-ESS, in this case we use the term species to denote in- 
dividual positions in genotype space. A large number of 
positions are occupied by a small number of individuals; 
the occupancy of these positions is never established for 
extended periods during the q-ESS. The robust species 
contain a reasonable number of individuals and are dis- 
tributed according to the broad peak. The peak can be 
fitted by a log-normal curve in a way similar to observed 
species abundance functions, see e.g. |3j|. We note that 
comparable species abundances functions are found in 
the predator-prey model studied by McKane, Alonso and 

Sole ifnl. 



VI. COMPETITION BETWEEN SEXUAL AND 
ASEXUAL REPRODUCTION 

We now consider the competition within the TaNa 
model between sexual and asexual reproduction in sim- 
ulations of mixed populations. This is done by adding 
an extra reproduction determining gene to the genome 
(making L = 21 in this section). This extra gene does 
not explicitly enter in H(S,t) but dictates an individ- 
ual's reproductive mode. Mutations to this gene occur 
during reproduction in the normal way. Obviously, the 
simplistic nature of the TaNa model excludes the model 



from realistically representing all biological features of 
the difference between sexual and asexual reproduction. 
Perhaps the most essential difference between sexual and 
asexual reproduction is the reshuffling of genes caused 
by the crossing over and recombination involved in sex- 
ual reproduction 34 . It is possible that this effect can 



be captured by our definition of sexual reproduction for 
dmax > 1. To make the two reproductive modes as sim- 
ilar as possible in all aspects, except for the mixing of 
parent genes in the sexual case, we redefine in this sec- 
tion slightly the reproduction procedures in the following 
way. The only difference compared with the definitions in 
Sec. [I B is that an asexual reproducing individual pro- 
duces one new individual and we leave the parent in the 
system. In sexual reproduction the reproduction proba- 
bility is assumed to be p t / of the first of the two selected 
parents. 

According to Weismann, sexual reproduction is more 
efficient to adapt to changed conditions because recom- 
bination produces a larger variation of types upon which 
Natural Selection is able to act, see Burt's excelent re- 
view J34| . It follows that in a harsher environment, sexual 
reproduction will be superior to asexual; a phenomenon 
which is observed |34j. The TaNa model in its present 
form is not entirely able to account for these facts. It is 
informative, nevertheless, to investigate the behavior of 
the model and to understand why it fails in this particu- 
lar respect. 

In Fig. [h] we show the total size of the population 
together with the sizes of the sexual and asexually repro- 
ducing subpopulations. We see that the asexual popula- 
tion is always largest and even more so during the q-ESS. 

In Fig. |ll| we show the ratio of the average size N a of 
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VII. THE ERROR THRESHOLD 



<Na>/<Ns> 




12 dmax 



FIG. 11: The ratio of the average number of asexual and 
sexually reproducing individuals as a function of d ma x and p. 
Parameters are as in Fig. juj 



the asexual reproducing population and the average size 
N s of the sexually reproducing population. One notices 
that the asexual population is always more numerous, 
but less so for large d max and low fi. That is the region 
where recombination is most important and the environ- 
ment (modeled by fi) least friendly. So in this respect 
the TaNa model exhibit a trend in the right direction in 
the sense of Weismann, see Burt's review [ |34| . 

The reason the sexually reproducing population is un- 
able to out-compete the asexual in the TaNa model is 
easy to understand. The offspring produced by two sex- 
ually reproducing individuals is likely to end up at a posi- 
tion in genotype space different from the parent positions. 
This effect makes it difficult to maintain the occupancy 
of the parent positions, even if these positions are highly 
fit. It is obviously this variation that is expected to make 
sexual reproduction more efficient in searching genotype 
space. However, in its present form, the TaNa model al- 
lows very large variations in the weight function H in Eq. 
(|l|), even for positions very close by in genotype space. 
Whether this is reasonable or not depends on the inter- 
pretation of the genomic sequence S. If we think of each 
"gene" Si as representing, in an averaged way, a section 
of length 1/L of the entire genome, then a separation by 
Hamming distance one is very significant for L as small 
as 20; and it is reasonable that two positions in geno- 
type space of this or larger separation may have very 
different weight functions H . On the other hand, this 
interpretation means that two individuals being different 
by a single Hamming distance differ by a major fraction 
of the entire genome and should therefore not belong to 
the same species and accordingly not be able to produce 
any (viable) offspring. We conclude that a more realistic 
study of the competition between sexual and asexual re- 
production by use of the TaNa model should be possible 
if a more slowly varying weight function is used. Future 
studies will focus on this problem. 



At sufficiently large mutation rates p m ut , offspring are 
so different from their parents that the occupation in 
genotype space rapidly moves from one position to the 
next. When this happens, it becomes impossible to es- 
tablish the q-ESS seen in Fig. ^|and consequently the en- 
tire simulation consists of one hectic period. The change 
from the behavior depicted in Fig. |^, where the hectic 
periods are of much shorter duration than the q-ESS, to 
the behavior where the q-ESS are absent, occurs over a 
very narrow region ofp mut values. Considering first large 
values of p mu t we gradually decrease p mu t in the simula- 
tions and we identify the threshold value, pth of p m ut at 
which q-ESS are observed as the error threshold [0, ||. 
In Fig. [l^ we plot the simulated value of p t h for different 
values of the width parameter c. 

We can estimate the c dependence of p t h by the follow- 
ing argument. From Fig. |J we know that the distribution 
of p f f in the hectic periods is centered about pkm and 
in the q-ESS is centered about p q -ESS defined in Eq. 
(||). Changing the parameter c will change the width of 
the distribution of the p ff values (see Eqs. (Q) and Eq. 
(Q)). It will be possible to establish q-ESS in between the 
hectic periods if pkui +c P < Pq-ESSi where a p is the half 
width of the peak in the distribution of p D f / in the hectic 
periods. We translate this argument to the distribution 
of the H values and obtain the following estimate for p t h- 



Pth 



1 - 2- 1 / i [(l -p M ,K Q/c + 1 +Pkm] 1/L - (5) 



Here we have assumed that the width a p of the peak 
in the distribution of H values will be given by a p — 
a/c (see Eq. ([!])) in which case a is a measure of the 
standard deviation of the factor in Eq. ([!]) multiplying 
1/c. We have used a = 0.07 to fit the simulation data 
in Fig. 1^. This value is somewhat larger but of the 
right order of magnitude as the corresponding quantity 
measured during the simulation. 



VIII. 



PARAMETER DEPENDENCE 



For completeness, we present here the dependence on 
the parameters c and p in the Hamiltonian given in Eq. 
(0). 

We show in Figs. |l^ and |l4| the averaged occupation 
measured as the ratio between the average number of in- 
dividuals and the average number of occupied positions in 
genotype space for purely asexual and sexual reproducing 
populations respectively. As expected, the system is able 
to support the largest populations in the region of small 
H parameter and broad distribution of coupling strength, 
i.e., small values of c. The sexual reproduction is most 
sensitive to a decrease in the carrying capacity (increase 
in fi) or a decrease in the width of the range of possible 
J(S a ,S b ) couplings (increase in c). 
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FIG. 12: The loss of q-ESS occurs for mutation rates above 
the circles. For comparison, the theoretically predicted error 
threshold pJJ uj (c) is shown for a = 0.07 (see main text). The 
carrying capacity parameter is /j, = 0.005. 



<N>/<D> 




FIG. 14: The same data and parameters as in Fig. |l3| ex- 
cept that the data in this figure is for a sexually reproducing 
population. 



<N>/<D> 




FIG. 13: The ratio between the average size of the population 
and the average diversity as function of the width parameter 
c and the physical environment parameter fi. The data are 
for a system with asexual reproduction with Lp mu t = 0.2. 



IX. 



DISCUSSION AND CONCLUSION 



The Tangled Nature model may be considered as a 
mathematical framework for the study of evolutionary 
ecology. The dynamics of the model is denned at the 
level of individuals, cither as asexual or as sexually repro- 
ducing individuals. All ecological structure in the model 
arises through emergence. The model is able to gener- 
ate many of the observed features of biological evolution 
starting from a basic implementation of the key assump- 
tions in the Darwinian evolution paradigm. 

The density of individuals in genotype space segregates 
corresponding to the emergence of distinct species. The 
interaction between individuals gives rise to a jerky or 
intermittent macro-dynamics, in which quasi-stable con- 
figurations (the q-ESS) in genotype space are abruptly 



replaced by new quasi-stable configurations. This mode 
of operation can be compared with the intermittent be- 
havior observed in the fossil record and emphasized be 
Gould and Eldredge's the term "punctuated equilibrium" 
0. The TaNa model is always in a transient where the 
configurations generated as a result of adaptation to the 
co-evolutionary selective pressure gradually produce con- 
figurations, or ecologies, in genotype space which collec- 
tively exhibit a higher degree of adaptation, in the sense 
that the average lifetime of these q-ESS increases slowly. 
This behavior compares well with the observation that 
the fossil record indicates a decrease in the extinction 
rate [Q. The increase in the lifetime of the q-ESS is as- 
sociated with an increase in the complexity (species di- 
versity, number of active interactions) of successive con- 
figurations. This gradual aging together with the inter- 
mittent nature of the dynamics suggest that some char- 
acteristic of the evolving ecosystem might be undergoing 
record statistics in the sense of Sibani and co-workers 
|p5| , 26 , |27], So far we have unfortunately not been 
able to identify the appropriate effective variable which 
moves through the records but we expect this variable to 
be related to the stability matrix of the effective dynam- 
ical equation. 

The species abundance distribution generated by the 
TaNa model encourage future studies of larger popula- 
tions with longer genome sequences. This will enable a 
hierarchical study of the taxonomic organisation of the 
generated ecologies. Using distance criteria in genotype 
space one can study the clustering of individuals into 
species, of species into genera, etc. Future studies will 
also examine the phylogenetic structures in detail, espe- 
cially during the radiation of species encountered in the 
transition periods between q-ESS. Using longer genome 
sequences and a more smoothly varying weight function, 
we expect the TaNa model to be able to illuminate the 
evolutionary competition between sexual and asexual re- 
production. 
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